Abstract

This is an expository paper on approximating functions from general Hilbert or Banach spaces in the worst case, average case and randomized settings with error measured in the $L_p$ sense. We define the power function as the ratio between the best rate of convergence of algorithms that use function values and the best rate of convergence of algorithms that use arbitrary linear functionals, for a worst possible Hilbert or Banach space for which the problem of approximating functions is well defined. Obviously, the power function takes values at most one. If these values are one or close to one, then the power of function values is the same or almost the same as the power of arbitrary linear functionals. We summarize and supply a few new estimates on the power function. We also indicate eight open problems related to the power function, since this function has not yet been studied in many cases. We believe that the open problems will be of interest to a general audience of mathematicians.
1 Introduction

This is an expository paper on the problem of approximating functions from general Hilbert or Banach spaces, which has been thoroughly studied in many books and papers. This problem has many variants depending on how we measure the error of such approximations (algorithms). A popular choice is to take the norm of an $L_p$ space, and all values of $p\in[1,\infty]$ have been considered. Furthermore, the error of algorithms can be defined in the worst case, average case or randomized setting. For the worst and average case settings, we consider deterministic algorithms. The worst case error is defined as the maximal error over the unit ball of a given space, whereas the average case error is defined as the average error over the whole space with respect to a given measure. The usual choice is a zero mean Gaussian measure. For the randomized setting we consider randomized algorithms, and the error is defined as the maximal expected error over the unit ball of a given space. Here, the expected error is taken with respect to a probability distribution of the randomized elements.

We approximate functions $f$ by algorithms that use information about $f$ given by finitely many functionals of $f$. Information is called linear if we can choose arbitrary linear functionals, and it is called standard if only function values may be used. Clearly, linear information is at least as powerful as standard information. For many applications, only standard information is available. But even in this case, it is a good idea to study linear information and learn how difficult the function approximation problem is. For example, if we can prove that the problem is too difficult even for linear information then, obviously, the same also holds for standard information. On the other hand, positive results for linear information need not hold for standard information.

The main question addressed in this expository paper is the study of the power of standard information or, equivalently, the power of function values. We want to know how much we lose if function values are used instead of linear information. Or, more optimistically, we ask when the power of standard information is the same or nearly the same as the power of linear information. Such questions have been addressed in a number of papers, and we will refer to them in the course of this paper. This has usually been done for specific spaces, and only a few papers addressed these questions for classes of spaces.

Our approach is a little more general: we want to verify the power of function values (standard information) for all Hilbert or Banach spaces for which the problem of function approximation is well defined. More precisely, we define the power function¹

$$\ell^{\mathrm{sett}\text{-}x}\colon(0,\infty)\times[1,\infty]\to[0,1].$$

Here $\mathrm{sett}\in\{\mathrm{wor},\mathrm{ran},\mathrm{avg}\}$ denotes the setting we use for the error definition. Hence, wor stands for the worst case setting, ran for the randomized setting, and avg for the average case setting. The second superscript $x\in\{H,B\}$ tells us whether we consider only Hilbert spaces ($x=H$) or allow all Banach spaces ($x=B$).

¹ We needed to find a good one-letter name for the power function. Since in English and in Polish this would indicate the letter "p", which is already used as the parameter of the $L_p$ space, we turn to German and use the word "Leistung". That is why the letter $\ell$ denotes the power function.

We now explain the meaning of the value

$$\ell^{\mathrm{sett}\text{-}x}(r,p).$$

The first argument $r$ means that the $n$th minimal error (formally defined in Definition 1) behaves like $n^{-r}$ if we use linear information. Since $r>0$, we consider Hilbert or Banach spaces which admit convergence, and furthermore they admit a polynomial rate of convergence of the minimal errors. The second argument $p$ denotes the use of the norm of $L_p$. The value $\ell^{\mathrm{sett}\text{-}x}(r,p)$ is defined as $r^{-1}$ times the best rate of convergence we obtain using only function values for a worst possible choice of a Hilbert or Banach space. That is why $\ell^{\mathrm{sett}\text{-}x}(r,p)\le1$, and the larger $\ell^{\mathrm{sett}\text{-}x}(r,p)$, the better. Hence, if we have

$$\ell^{\mathrm{sett}\text{-}x}(r,p)=1$$

then the power of standard information is the same as the power of linear information. We will see later that this does happen in some cases. Then standard information yields the same rate of convergence as linear information for the embeddings $I\colon F\to L_p$ for all Hilbert (if $x=H$) or Banach (if $x=B$) spaces, without the need of a case by case study for each $F$. This holds in the randomized setting for Hilbert spaces with $p=2$, see Theorem El, and in the average case setting for Banach spaces equipped with zero mean Gaussian measures and $p=2$, see Theorem El. It is open whether $\ell^{\mathrm{wor}\text{-}x}(r,p)=1$ may happen in the worst case setting, see Open Problem 1.
On the other hand, if we have 

$$\ell^{\mathrm{sett}\text{-}x}(r,p)=0$$

then the power of standard information is zero as compared to the power of linear information. 
Finally, if we have 

$$\ell^{\mathrm{sett}\text{-}x}(r,p)\in(0,1)$$

then we know quantitatively how much we may lose by using function values.

The concept of the power function seems to be new. For many values of $p$, especially when $p\ne2$, this function has not yet been studied. This is especially the case for the randomized and average case settings. That is why we indicate eight open problems related to the power function, with the hope that many mathematicians will be interested in solving them and advancing our knowledge about the power of function values.

In this paper, we summarize and supply a few new estimates on the power function. We now briefly indicate a few results presented in the paper.

In the worst case setting for the Hilbert case and $p=2$, we conclude from [6, 8] that

$$\ell^{\mathrm{wor}\text{-}H}(r,2)=0\quad\text{for all } r\in\big(0,\tfrac12\big],$$
$$\ell^{\mathrm{wor}\text{-}H}(r,2)\in\Big[\frac{2r}{2r+1},\,1\Big]\quad\text{for all } r\in\big(\tfrac12,\infty\big).$$

Hence, the power of function values is zero for $r\le1/2$, and almost the same as the power of linear information for large $r$. One of the main open problems is to verify whether $\ell^{\mathrm{wor}\text{-}H}(r,2)=1$ for all $r>1/2$.

Staying with the worst case and Hilbert spaces but with $p\ne2$, we conclude from [18] that

$$\ell^{\mathrm{wor}\text{-}H}(r,p)=0\quad\text{for all } r\in\Big(0,\min\Big(\frac1p,\frac12\Big)\Big].$$

For $r>\min(1/p,1/2)$, we do not know anything about the values of $\ell^{\mathrm{wor}\text{-}H}(r,p)$ except the case $p=\infty$, for which we know from [12] that

$$\ell^{\mathrm{wor}\text{-}H/B}(r,\infty)\ge1-\frac1r\quad\text{for all } r>1.$$
By H/B we mean that we obtain this result for both Hilbert and Banach spaces. Again for large $r$, the power of standard information is almost the same as the power of linear information.
For the worst case and the Banach case, we have 

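$$\ell^{\mathrm{wor}\text{-}B}(r,p)=0\quad\text{for all } r\in(0,1] \text{ and } p\in[1,2],$$
$$\ell^{\mathrm{wor}\text{-}B}(r,p)=0\quad\text{for all } r\in\big(0,\tfrac12+\tfrac1p\big] \text{ and } p\in(2,\infty),$$
$$\ell^{\mathrm{wor}\text{-}B}(r,p)\le1-\frac1r\Big(1-\frac1p\Big)\quad\text{for all } r>1 \text{ and } p\in[1,2],$$
$$\ell^{\mathrm{wor}\text{-}B}(r,p)\le1-\frac1{2r}\quad\text{for all } r>1 \text{ and } p\in[2,\infty),$$
$$1-\frac1r\le\ell^{\mathrm{wor}\text{-}B}(r,\infty)\le1-\frac1{2r}\quad\text{for all } r>1,$$

see Theorem 5 in Section 2.3.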

Even though we do not know much about the power function in this case, we can conclude that 
the Hilbert and Banach cases are different since 

$$\ell^{\mathrm{wor}\text{-}B}(r,2)<\ell^{\mathrm{wor}\text{-}H}(r,2)\quad\text{for all } r\in\big(\tfrac12,\infty\big).$$

Surprisingly enough, for the randomized setting with the Hilbert case and for the average case setting with the Hilbert or Banach case, we have complete knowledge about the power function for $p=2$ due to [28] and [?]. More precisely, we know that

$$\ell^{\mathrm{ran}\text{-}H}(r,2)=\ell^{\mathrm{avg}\text{-}H/B}(r,2)=1\quad\text{for all } r>0.$$

More estimates of the power function can be found in the subsequent sections. 

2 Worst case setting 

Let $F$ be a Hilbert or Banach space of functions, defined on a set $\Omega$, such that the linear functionals $f\mapsto f(x)$ are continuous for all $x\in\Omega$. We assume that $F\subset L_p$ and that the embedding $I\colon F\to L_p$ is continuous², where $I(f)=f$. We write $H$ instead of $F$ if $F$ is a Hilbert space.

Let $(c_n)$ be a sequence of nonnegative numbers. Assume first that $(c_n)$ converges to zero. We define its (polynomial) rate of convergence $r(c_n)$ by

$$r(c_n)=\sup\Big\{\beta\ge0 : \lim_{n\to\infty}c_n\,n^{\beta}=0\Big\}.$$

If $(c_n)$ does not converge to zero, we set $r(c_n)=0$. Then $r(c_n)$ is well defined for all nonnegative sequences $(c_n)$. For example, the rate of convergence of $n^{-\alpha}$ is $\max(0,\alpha)$.
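The definition can be explored numerically. The Python sketch below (the function name `estimated_rate` is ours, not from the text) uses the fact that, for positive $c_n$, $r(c_n)=\max\big(0,\liminf_{n\to\infty}(-\ln c_n)/\ln n\big)$, and replaces the liminf by a minimum over a finite window of indices. A finite window can only suggest a rate, never certify it.

```python
import math


def estimated_rate(c, n_min=2):
    """Finite-window proxy for the polynomial rate r(c_n).

    c[0] holds c_1, c[1] holds c_2, and so on. For positive c_n,
    r(c_n) = max(0, liminf_n -ln(c_n)/ln(n)); here the liminf is replaced
    by the minimum of -ln(c_n)/ln(n) over n_min <= n <= len(c).
    """
    ratios = []
    for n in range(max(n_min, 2), len(c) + 1):
        cn = c[n - 1]
        if cn > 0:  # zero terms put no constraint on the rate
            ratios.append(-math.log(cn) / math.log(n))
    if not ratios:  # all terms in the window vanish
        return math.inf
    return max(0.0, min(ratios))


N = 10_000
print(estimated_rate([n ** -2.0 for n in range(1, N + 1)]))  # close to 2
print(estimated_rate([float(n) for n in range(1, N + 1)]))   # 0.0: n^{-alpha} with alpha = -1
print(estimated_rate([0.5] * N))                             # about 0.075, shrinks as N grows
```

The last line shows why only the limit matters: a sequence that does not converge to zero still produces a small positive proxy on any finite window.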

We approximate functions from $F$ using finitely many arbitrary linear functionals $L\in F^*$ or function values $f(x)$ for some $x\in\Omega$. We define the error of such approximations by taking the worst case setting with respect to the $L_p$ norm. The norm of $L_p$ is denoted by $\|\cdot\|_p$.

We define two classes $\Lambda^{\mathrm{all}}$ and $\Lambda^{\mathrm{std}}$ of information evaluations. We have $\Lambda^{\mathrm{std}}\subset\Lambda^{\mathrm{all}}=F^*$, and $\Lambda^{\mathrm{std}}$ consists of linear functionals of the form $L_x(f)=f(x)$ for all $f\in F$, where $x\in\Omega$. We approximate functions from $F$ by algorithms $A_n\colon F\to L_p$ given by

$$A_n(f)=\varphi_n\big(L_1(f),L_2(f),\dots,L_n(f)\big),$$

where $n$ is a nonnegative integer, $\varphi_n\colon\mathbb{R}^n\to L_p$ is an arbitrary mapping, and $L_j\in\Lambda$, where $\Lambda\in\{\Lambda^{\mathrm{all}},\Lambda^{\mathrm{std}}\}$. The choice of $L_j$ can be adaptive, that is, $L_j(\cdot)=L_j(\cdot\,;L_1(f),\dots,L_{j-1}(f))$ may depend on the already computed values $L_1(f),L_2(f),\dots,L_{j-1}(f)$. For $n=0$, $A_0(f)$ equals some fixed element of the space $L_p$. More details can be found in, e.g., [14, 21].

² We do not specify $\Omega$ or the underlying measure of $L_p$ since they can be arbitrary.
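The structure of $A_n$ with standard information, where the $j$th point may depend on the values already computed, can be sketched as follows. Everything here is a hypothetical illustration: `run_adaptive`, the bisection rule and the step-function reconstruction are our own choices and are not taken from the text. The step-function target is not an element of any unit ball considered here; it only makes adaptivity visible, since each $x_j$ depends on $f(x_1),\dots,f(x_{j-1})$ while $f$ is accessed only through $n$ function values.

```python
from typing import Callable, List, Sequence, Tuple

Values = Sequence[float]
Reconstruction = Callable[[float], float]


def run_adaptive(f: Callable[[float], float], n: int,
                 choose_point: Callable[[Values, Values], float],
                 phi: Callable[[Values, Values], Reconstruction]) -> Reconstruction:
    """Hypothetical driver for A_n(f) = phi_n(f(x_1), ..., f(x_n)) with adaptive x_j."""
    xs: List[float] = []
    ys: List[float] = []
    for _ in range(n):
        x = choose_point(xs, ys)   # may depend on f(x_1), ..., f(x_{j-1})
        xs.append(x)
        ys.append(f(x))            # the only access to f: one function value
    return phi(xs, ys)


def bracket(xs: Values, ys: Values) -> Tuple[float, float]:
    """Interval [lo, hi] containing the jump t of the step function f = 1_{[t, 1]}."""
    lo, hi = 0.0, 1.0
    for x, y in zip(xs, ys):
        if y >= 0.5:   # f(x) = 1 means t <= x
            hi = min(hi, x)
        else:          # f(x) = 0 means t > x
            lo = max(lo, x)
    return lo, hi


def bisection_point(xs: Values, ys: Values) -> float:
    lo, hi = bracket(xs, ys)
    return 0.5 * (lo + hi)


def step_reconstruction(xs: Values, ys: Values) -> Reconstruction:
    lo, hi = bracket(xs, ys)
    t_hat = 0.5 * (lo + hi)
    return lambda x: 1.0 if x >= t_hat else 0.0


t = 0.3141


def f(x: float) -> float:
    return 1.0 if x >= t else 0.0


A = run_adaptive(f, 20, bisection_point, step_reconstruction)
print(A(0.3), A(0.33))  # 0.0 1.0
```

After $n$ steps the jump is located to within $2^{-n-1}$; a nonadaptive choice of $n$ points could only guarantee an accuracy of order $1/n$ for this toy problem.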

Hence, we consider algorithms that use $n$ linear functionals either from the class $\Lambda^{\mathrm{std}}$ or from the class $\Lambda^{\mathrm{all}}$. We define the minimal errors as follows.

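Definition 1. For $n=0$ and for $n\in\mathbb{N}:=\{1,2,\dots\}$, let

$$e_n^{\mathrm{all\text{-}wor}}(F,L_p):=\inf_{A_n\text{ with }L_j\in\Lambda^{\mathrm{all}}}\ \sup_{\|f\|_F\le1}\|f-A_n(f)\|_p$$

and

$$e_n^{\mathrm{std\text{-}wor}}(F,L_p):=\inf_{A_n\text{ with }L_j\in\Lambda^{\mathrm{std}}}\ \sup_{\|f\|_F\le1}\|f-A_n(f)\|_p.$$

For $n=0$, it is easy to see that the best algorithm is $A_0=0$ and we obtain

$$e_0^{\mathrm{all\text{-}wor}}(F,L_p)=e_0^{\mathrm{std\text{-}wor}}(F,L_p)=\sup_{\|f\|_F\le1}\|f\|_p=\sup_{\|f\|_F\le1}\|I(f)\|_p=\|I\|.$$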

This is the initial error that can be achieved without computing any linear functional on the functions $f$. Clearly,

$$e_n^{\mathrm{all\text{-}wor}}(F,L_p)\le e_n^{\mathrm{std\text{-}wor}}(F,L_p)\quad\text{for all } n\in\mathbb{N}.$$

The sequences $(e_n^{\mathrm{all\text{-}wor}}(F,L_p))$ and $(e_n^{\mathrm{std\text{-}wor}}(F,L_p))$ are both non-increasing but not necessarily convergent to zero.

We want to compare the rates of convergence

$$r^{\mathrm{all\text{-}wor}}(F,L_p):=r\big(e_n^{\mathrm{all\text{-}wor}}(F,L_p)\big)\quad\text{and}\quad r^{\mathrm{std\text{-}wor}}(F,L_p):=r\big(e_n^{\mathrm{std\text{-}wor}}(F,L_p)\big).$$

In particular, we would like to know whether the sequence $(e_n^{\mathrm{all\text{-}wor}}(F,L_p))$ can converge to zero much faster than the sequence $(e_n^{\mathrm{std\text{-}wor}}(F,L_p))$. In many cases it is much easier to analyze the sequence $(e_n^{\mathrm{all\text{-}wor}}(F,L_p))_{n\in\mathbb{N}}$. It is then natural to ask what can be said about the sequence $(e_n^{\mathrm{std\text{-}wor}}(F,L_p))_{n\in\mathbb{N}}$.

The main question addressed in this paper is to find or estimate the power function $\ell^{\mathrm{wor}\text{-}x}\colon(0,\infty)\times[1,\infty]\to[0,1]$ defined by

$$\ell^{\mathrm{wor}\text{-}x}(r,p):=\inf_{F:\ r^{\mathrm{all\text{-}wor}}(F,L_p)=r}\ \frac{r^{\mathrm{std\text{-}wor}}(F,L_p)}{r},$$

where $x\in\{H,B\}$ indicates that the infimum is taken over all Hilbert spaces ($x=H$) or over all Banach spaces ($x=B$) continuously embedded in $L_p$ for which function values are continuous linear functionals and the rate of convergence is $r$ when we use arbitrary linear functionals.

It is easy to show, and it will be shown later, that the set of spaces $F$ for which $r^{\mathrm{all\text{-}wor}}(F,L_p)=r$ is not empty, and therefore $\ell^{\mathrm{wor}\text{-}x}$ is well defined. Obviously, $\ell^{\mathrm{wor}\text{-}x}(r,p)\in[0,1]$, as already claimed, since $e_n^{\mathrm{all\text{-}wor}}\le e_n^{\mathrm{std\text{-}wor}}$ gives $r^{\mathrm{std\text{-}wor}}(F,L_p)\le r$. The power function $\ell^{\mathrm{wor}\text{-}x}$ measures the ratio between the best rates of convergence of approximations based on function values and those based on arbitrary linear functionals for a worst possible Hilbert or Banach space.
We briefly comment on why we take the infimum over $F$ in the definition of the power function. For some specific spaces $F$, standard information is as powerful as linear information³. But this is a property of $F$, not an indication of the power of standard information. By taking the infimum with respect to $F$, we concentrate on the power of standard information as compared to the power of linear information.

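Suppose now that we take the minimal $n=n^{\mathrm{wor\text{-}all/std}}(\varepsilon,F,L_p)$ for which the minimal worst case error is at most $\varepsilon$ (absolute error criterion) or at most $\varepsilon\|I\|$ (normalized error criterion). Assume for simplicity that

$$e_n^{\mathrm{all\text{-}wor}}(F,L_p)=n^{-r}\quad\text{and}\quad e_n^{\mathrm{std\text{-}wor}}(F,L_p)=n^{-\alpha}\quad\text{for all } n\in\mathbb{N}$$

for some positive $\alpha=r^{\mathrm{std\text{-}wor}}(F,L_p)\le r$. Then, for the absolute error criterion and $\varepsilon\in(0,1)$,

$$n^{\mathrm{wor\text{-}all}}(\varepsilon,F,L_p)=\big\lceil\varepsilon^{-1/r}\big\rceil\quad\text{and}\quad n^{\mathrm{wor\text{-}std}}(\varepsilon,F,L_p)=\big\lceil\varepsilon^{-1/\alpha}\big\rceil.$$

Clearly,

$$\lim_{\varepsilon\to0}\frac{\ln n^{\mathrm{wor\text{-}all}}(\varepsilon,F,L_p)}{\ln n^{\mathrm{wor\text{-}std}}(\varepsilon,F,L_p)}=\frac{\alpha}{r}\ge\ell^{\mathrm{wor}\text{-}x}(r,p).$$

Hence, if $\ell^{\mathrm{wor}\text{-}x}(r,p)=1$ then function values are as powerful as arbitrary linear functionals. On the other hand, the smaller $\ell^{\mathrm{wor}\text{-}x}(r,p)$, the less powerful function values are as compared to arbitrary linear functionals. If $\ell^{\mathrm{wor}\text{-}x}(r,p)=0$ then the polynomial behavior of $n^{\mathrm{all}}(\varepsilon,F,L_p)$ in $\varepsilon^{-1}$ can be drastically changed for $n^{\mathrm{std}}(\varepsilon,F,L_p)$.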
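The limit above can be checked numerically under the same simplifying assumption $e_n^{\mathrm{all\text{-}wor}}=n^{-r}$ and $e_n^{\mathrm{std\text{-}wor}}=n^{-\alpha}$. The sketch below is our own illustration with the assumed values $r=2$ and $\alpha=1$; it computes the minimal $n$ for the absolute error criterion and the ratio of logarithms, which approaches $\alpha/r=1/2$.

```python
import math


def n_min(eps, rate):
    """Smallest n >= 1 with n**(-rate) <= eps, i.e. ceil(eps**(-1/rate))."""
    n = max(1, math.ceil(eps ** (-1.0 / rate)))
    # correct possible off-by-one effects of floating-point rounding
    while n > 1 and (n - 1) ** (-rate) <= eps:
        n -= 1
    while n ** (-rate) > eps:
        n += 1
    return n


r, alpha = 2.0, 1.0  # assumed rates for linear and standard information
for eps in (1e-2, 1e-4, 1e-8):
    n_all, n_std = n_min(eps, r), n_min(eps, alpha)
    print(eps, n_all, n_std, math.log(n_all) / math.log(n_std))
```

For these pure powers the ratio is already close to $1/2$ for moderate $\varepsilon$; with additional logarithmic factors in the errors it would approach its limit much more slowly.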

Remark 1. It is well known that, in some cases, we can restrict ourselves to linear algorithms. This holds when $p=\infty$ or when $F$ is a Hilbert space. Then the corresponding infima for the minimal worst case errors are attained by algorithms of the form

$$A_n(f)=\sum_{j=1}^n L_j(f)\,h_j$$

for some $L_j\in\Lambda\in\{\Lambda^{\mathrm{std}},\Lambda^{\mathrm{all}}\}$ and $h_j\in L_p$. Much more about the existence of linear optimal error algorithms can be found in, e.g., [14].
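For standard information, an algorithm of the form in Remark 1 has $L_j(f)=f(x_j)$. The sketch below shows one such linear algorithm on $[0,1]$: piecewise linear interpolation at equispaced nodes, with hat functions as $h_j$. The nodes, the hat functions and the function names are our own illustrative choices; nothing here claims that this algorithm is optimal for any particular space $F$.

```python
import math


def hat(k, n):
    """Hat function centred at the node x_k = k/(n-1), k = 0, ..., n-1 (n >= 2)."""
    width = 1.0 / (n - 1)
    center = k * width
    return lambda x: max(0.0, 1.0 - abs(x - center) / width)


def linear_algorithm(f, n):
    """A_n(f) = sum_j f(x_j) h_j: linear in f and uses n function values."""
    width = 1.0 / (n - 1)
    values = [f(k * width) for k in range(n)]   # L_j(f) = f(x_j)
    basis = [hat(k, n) for k in range(n)]       # h_j, fixed in advance
    return lambda x: sum(v * h(x) for v, h in zip(values, basis))


def sup_error(f, g, grid=2001):
    """Maximum of |f - g| on an equispaced grid (a proxy for the L_inf norm)."""
    return max(abs(f(i / (grid - 1)) - g(i / (grid - 1))) for i in range(grid))


def target(x):
    return math.sin(2 * math.pi * x)


for n in (5, 9, 17, 33):
    print(n, sup_error(target, linear_algorithm(target, n)))
```

The functions $h_j$ do not depend on $f$, so $A_n(f+g)=A_n(f)+A_n(g)$; for this smooth target the printed errors drop by roughly a factor of four each time the mesh is halved.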



2.1 Double Hilbert Case 

In this subsection, we consider the approximation problem defined over a Hilbert space with the error measured also in the Hilbert space $L_2$. That is why the name of this subsection is the double Hilbert case. Approximation in the $L_2$ norm for Hilbert spaces has been studied in many papers. For our problem the most relevant papers are [6], [8] and [27].

Assume that $H$ is a Hilbert space of functions defined on a set $\Omega$. Since we assume that function values are continuous, $H$ is a reproducing kernel Hilbert space, $H=H(K)$, where $K$ is defined on $\Omega\times\Omega$. Let $L_2=L_2(\Omega,\mu)$ be the space of $\mu$-square integrable functions with a measure $\mu$ on $\Omega$. Since the embedding $I\colon H(K)\to L_2(\Omega,\mu)$ is continuous, we have

$$\int_\Omega|f(t)|^2\,d\mu(t)<\infty\quad\text{for all } f\in H(K).$$

In particular, we can take $f=K(\cdot,t)$ for arbitrary $t\in\Omega$, since such a function $f$ belongs to $H(K)$. Therefore $W=I^*I\colon H(K)\to H(K)$, where $I^*$ is defined by $\langle g,I(f)\rangle_2=\langle I^*(g),f\rangle_{H(K)}$ for all $f\in H(K)$ and $g\in L_2(\Omega,\mu)$, is given by

$$W(f)(x)=\int_\Omega K(x,t)\,f(t)\,d\mu(t)\quad\text{for all } f\in H(K).$$

³ This holds with $r=\infty$ for all finite dimensional spaces $F$. This also holds for some infinite dimensional Hilbert spaces $F$. For example, take $F$ as the space of piecewise constant functions over, say, $I_j:=[1/(j+1),1/j)$ for $j=1,2,\dots$. The inner product of $F$ is chosen such that the functions $e_j$ equal to $1$ over $I_j$ (and $0$ elsewhere) are orthonormal. Then the algorithm $A_n(f)=\sum_{j=1}^n\langle f,e_j\rangle_F\,e_j$ minimizes the worst case error for all $L_p$ with $p\in[1,\infty)$. The error is $[n(n+1)]^{-1/p}$. Since $\langle f,e_j\rangle_F=f(1/(j+1))$, we may say that this algorithm uses standard information. Therefore

The operator $W$ is self-adjoint and positive semi-definite. It is well known that

$$\lim_{n\to\infty}e_n^{\mathrm{all\text{-}wor}}(H,L_2)=0$$

if and only if $W$ is compact, see, e.g., [14, Section 4.2.3]. Unfortunately, in general, $W$ need not be compact and therefore $e_n^{\mathrm{all\text{-}wor}}(H,L_2)$ does not have to go to zero. In fact, the sequence $(e_n^{\mathrm{all\text{-}wor}}(H,L_2))$ can be an arbitrary non-increasing sequence, as the following example shows.

Example 1 (Arbitrary sequence $e_n^{\mathrm{all\text{-}wor}}(H,L_2)$).

Let $(a_n)_{n\in\mathbb{N}}$ be an arbitrary non-increasing sequence of nonnegative numbers. Define $k^*$ as the number of positive $a_n$. If all $a_n$ are positive, we formally set $k^*=\infty$. If $k^*$ is finite let $\mathbb{N}_{k^*}=\{1,2,\dots,k^*\}$; otherwise let $\mathbb{N}_{k^*}=\mathbb{N}$.

For $k\in\mathbb{N}_{k^*}$, take arbitrary disjoint intervals $I_k$ of positive Lebesgue measure $|I_k|$ such that $\bigcup_{k\in\mathbb{N}_{k^*}}I_k=[0,1]$, and define the functions $e_k\colon[0,1]\to\mathbb{R}$ by

$$e_k=\frac{a_k}{\sqrt{|I_k|}}\,\mathbf{1}_{I_k},$$

where $\mathbf{1}_{I_k}$ is the indicator function of $I_k$. That is, $e_k(x)=a_k/\sqrt{|I_k|}$ for $x\in I_k$ and $e_k(x)=0$ for $x\notin I_k$.

Define the Hilbert space $H=\operatorname{span}\{e_k \mid k\in\mathbb{N}_{k^*}\}$ equipped with the inner product such that $\langle e_k,e_j\rangle_H=\delta_{k,j}$ for all $k,j\in\mathbb{N}_{k^*}$. This means that $H$ is the space of piecewise constant functions $f\colon[0,1]\to\mathbb{R}$ such that

$$f=\sum_{k=1}^{k^*}\alpha_k e_k\quad\text{with}\quad\alpha_k=\langle f,e_k\rangle_H\quad\text{and}\quad\|f\|_H=\Big(\sum_{k=1}^{k^*}\alpha_k^2\Big)^{1/2}<\infty.$$

The Hilbert space $H$ has the reproducing kernel

$$K(x,y)=\sum_{k=1}^{k^*}e_k(x)\,e_k(y)\quad\text{for all } x,y\in[0,1].$$

Indeed, first of all note that $K$ is well defined since for all $x$ and $y$ the last series has at most one nonzero term. Then $\langle K(\cdot,y_i),K(\cdot,y_j)\rangle_H=K(y_i,y_j)$, and for all real $\beta_1,\dots,\beta_m$

$$0\le\Big\|\sum_{j=1}^m\beta_jK(\cdot,y_j)\Big\|_H^2=\sum_{i,j=1}^m\beta_i\beta_jK(y_i,y_j).$$

This shows that the matrix $(K(y_i,y_j))_{i,j=1,2,\dots,m}$ is symmetric and positive semi-definite for all $m$ and $y_j$. Clearly,

$$\langle f,K(\cdot,y)\rangle_H=\sum_{k=1}^{k^*}\alpha_k e_k(y)=f(y),$$

and this completes the proof of the fact that $K$ is the reproducing kernel of $H$.

Let $L_2=L_2([0,1])$ be the usual space of square Lebesgue integrable functions. Note that

$$\|e_k\|_2=\frac{a_k}{\sqrt{|I_k|}}\Big(\int_{I_k}dt\Big)^{1/2}=a_k.$$

Therefore, for any $f\in H$, we have

$$\|I(f)\|_2=\|f\|_2=\Big(\sum_{k=1}^{k^*}\alpha_k^2a_k^2\Big)^{1/2}\le a_1\|f\|_H.$$

The last bound is sharp, and therefore $\|I\|=a_1$, showing that $H$ is continuously embedded in $L_2$. The operator $W$ now takes the form

$$W(f)=\sum_{k=1}^{k^*}\langle f,e_k\rangle_2\,e_k.$$

Note that $W(e_k)=\|e_k\|_2^2\,e_k=a_k^2e_k$. This means that $(a_k^2,e_k)$ are the eigenpairs of $W$ and

$$W(f)=\sum_{k=1}^{k^*}a_k^2\,\langle f,e_k\rangle_H\,e_k.$$

It is well known that

$$e_n^{\mathrm{all\text{-}wor}}(H,L_2)=a_{n+1}\quad\text{for all } n\in\mathbb{N},$$

see, e.g., [14, Section 4.2.3]. This proves that the behavior of $e_n^{\mathrm{all\text{-}wor}}(H,L_2)$ can be arbitrary and, in general, we do not have convergence of $e_n^{\mathrm{all\text{-}wor}}(H,L_2)$ to zero. Clearly, $W$ is compact if and only if $\lim_n a_n=0$.

In addition, this example also shows that for a given $\beta>0$ we can define a sequence such that $r^{\mathrm{all\text{-}wor}}(H,L_2)=\beta$. Indeed, it is enough to take $a_k=k^{-\beta}$. □
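A finite instance of Example 1 (so $k^*<\infty$) can be checked directly. In the Python sketch below, the sequence $a$, the interval lengths and the helper names are our own hypothetical choices, and a function $f\in H$ is stored through its coefficients $\alpha_k=\langle f,e_k\rangle_H$. The script confirms $\|e_k\|_2=a_k$ and that the truncation algorithm $A_n(f)=\sum_{k\le n}\langle f,e_k\rangle_H\,e_k$ has error $a_{n+1}$ at $f=e_{n+1}$ and never more than $a_{n+1}$ on sampled elements of the unit sphere.

```python
import math
import random


def build_example(a, lengths):
    """Finite instance of Example 1: I_k = [t_{k-1}, t_k), e_k = a_k/sqrt(|I_k|) on I_k."""
    assert abs(sum(lengths) - 1.0) < 1e-12 and len(a) == len(lengths)
    cuts = [0.0]
    for length in lengths:
        cuts.append(cuts[-1] + length)

    def e(k, x):  # value of e_k at x, k = 1, ..., k*
        lo, hi = cuts[k - 1], cuts[k]
        return a[k - 1] / math.sqrt(hi - lo) if lo <= x < hi else 0.0

    def evaluate(alpha, x):  # f(x) = sum_k alpha_k e_k(x)
        return sum(al * e(k, x) for k, al in enumerate(alpha, start=1))

    return cuts, e, evaluate


def l2_norm(alpha, cuts, evaluate):
    # f is constant on each I_k, so one midpoint value per interval is exact
    return math.sqrt(sum(evaluate(alpha, 0.5 * (lo + hi)) ** 2 * (hi - lo)
                         for lo, hi in zip(cuts, cuts[1:])))


def truncation_error(alpha, n, cuts, evaluate):
    # f - A_n(f) keeps only the coefficients alpha_{n+1}, alpha_{n+2}, ...
    tail = [0.0] * n + list(alpha[n:])
    return l2_norm(tail, cuts, evaluate)


a = [1.0, 0.5, 0.25, 0.125]        # non-increasing, k* = 4
lengths = [0.1, 0.2, 0.3, 0.4]
cuts, e, evaluate = build_example(a, lengths)


def unit(k):
    """Coefficient vector of e_k."""
    return [1.0 if j == k else 0.0 for j in range(1, len(a) + 1)]


print([round(l2_norm(unit(k), cuts, evaluate), 12) for k in range(1, 5)])  # [1.0, 0.5, 0.25, 0.125]

rng = random.Random(0)
for n in range(len(a)):
    at_e = truncation_error(unit(n + 1), n, cuts, evaluate)
    sampled = []
    for _ in range(500):
        v = [rng.gauss(0.0, 1.0) for _ in a]
        s = math.sqrt(sum(x * x for x in v))
        sampled.append(truncation_error([x / s for x in v], n, cuts, evaluate))
    print(n, a[n], at_e, max(sampled))  # at_e equals a_{n+1}; max(sampled) <= a_{n+1}
```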

We now discuss the power function $\ell^{\mathrm{wor}\text{-}H}$. We assume that $r^{\mathrm{all\text{-}wor}}(H,L_2)=r>0$. In particular, we assume that the operator $W$ is compact. Then $W$ has eigenpairs $(\lambda_j,\eta_j)$,

$$W(\eta_j)=\lambda_j\eta_j\quad\text{for all } j=1,2,\dots,$$

with $\langle\eta_j,\eta_k\rangle_H=\delta_{j,k}$. Without loss of generality, we can order the eigenvalues $\lambda_j$ such that $\lambda_1\ge\lambda_2\ge\cdots$. For all $f\in H$, we have

$$\langle f,\eta_k\rangle_2=\langle I(f),I(\eta_k)\rangle_2=\langle f,W(\eta_k)\rangle_H=\lambda_k\langle f,\eta_k\rangle_H.$$

In particular, letting $f=\eta_j$, we conclude that the functions $\eta_j$ are also orthogonal in the space $L_2$, with $\|\eta_j\|_2^2=\lambda_j$. As above, it is well known that

$$e_n^{\mathrm{all\text{-}wor}}(H,L_2)=\sqrt{\lambda_{n+1}}\quad\text{for all } n\in\mathbb{N}.$$

If $(e_n^{\mathrm{all\text{-}wor}}(H,L_2))$ converges to zero, then the same also holds for function values, i.e., $(e_n^{\mathrm{std\text{-}wor}}(H,L_2))$ also converges to zero. Indeed, we can reason as in Section 10.4 of [14] that all linear functionals can be approximated with an arbitrarily small error when we use function values; then it is enough to remember that the error $\sqrt{\lambda_{n+1}}$ is achieved by a linear algorithm that uses the $n$ linear functionals $\langle f,\eta_j\rangle_{H(K)}$.



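We have

$$\operatorname{trace}(W):=\sum_{j=1}^\infty\lambda_j=\int_\Omega K(x,x)\,d\mu(x)=\sum_{n=0}^\infty\big[e_n^{\mathrm{all\text{-}wor}}(H,L_2)\big]^2,$$

and this is finite if $r^{\mathrm{all\text{-}wor}}(H,L_2)>\frac12$. If $r^{\mathrm{all\text{-}wor}}(H,L_2)=\frac12$ then $\sum_{n=0}^\infty[e_n^{\mathrm{all\text{-}wor}}(H,L_2)]^2$ may be finite or infinite, and if $r^{\mathrm{all\text{-}wor}}(H,L_2)<\frac12$ then $\sum_{n=0}^\infty[e_n^{\mathrm{all\text{-}wor}}(H,L_2)]^2$ is infinite.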
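To see the threshold $r=\frac12$, take the eigenvalues $\lambda_j=j^{-2r}$, so that $e_n^{\mathrm{all\text{-}wor}}(H,L_2)=\sqrt{\lambda_{n+1}}=(n+1)^{-r}$ has rate $r$. This choice, and the log-corrected sequence used for $r=\frac12$, are our own examples. In the sketch below the partial sums of the trace stabilize for $r>\frac12$ and keep growing for $r\le\frac12$, while at $r=\frac12$ an extra logarithmic factor can make the series converge.

```python
import math


def partial_trace(lam, N):
    """sum_{j=1}^N lambda_j = sum_{n=0}^{N-1} [e_n^{all-wor}(H, L_2)]^2."""
    return math.fsum(lam(j) for j in range(1, N + 1))


def power_law(r):
    return lambda j: j ** (-2.0 * r)          # lambda_j = j^{-2r}


def log_corrected(j):
    return 1.0 / (j * math.log(j + 1) ** 2)   # rate 1/2, but a finite trace


for name, lam in [("r=0.4", power_law(0.4)), ("r=0.5", power_law(0.5)),
                  ("r=1.0", power_law(1.0)), ("r=0.5, log", log_corrected)]:
    print(name, [round(partial_trace(lam, N), 3) for N in (10**2, 10**3, 10**4, 10**5)])
```

For $r=1$ the partial sums approach $\pi^2/6$; for $r=\frac12$ without the logarithm they grow like $\ln N$.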
The result from [8] states that $r^{\mathrm{all\text{-}wor}}(H,L_2)=r>\frac12$ implies

$$r^{\mathrm{std\text{-}wor}}(H,L_2)\ge r\,\frac{2r}{2r+1}=\frac{2r^2}{2r+1}.$$



The case $\sum_{n=0}^\infty[e_n^{\mathrm{all\text{-}wor}}(H,L_2)]^2=\infty$ was studied in [6]. It was shown that for any $r\in[0,\frac12]$ there is a Hilbert space $H$ such that

$$r^{\mathrm{all\text{-}wor}}(H,L_2)=r\quad\text{and}\quad r^{\mathrm{std\text{-}wor}}(H,L_2)=0.$$



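These results give us the following bounds on the power function $\ell^{\mathrm{wor}\text{-}H}(\cdot,2)$.

Theorem 1 ([6, 8]).

$$\ell^{\mathrm{wor}\text{-}H}(r,2)=0\quad\text{for all } r\in\big(0,\tfrac12\big],$$
$$\ell^{\mathrm{wor}\text{-}H}(r,2)\in\Big[\frac{2r}{2r+1},\,1\Big]\quad\text{for all } r\in\big(\tfrac12,\infty\big).$$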



Although we do not know the power function $\ell^{\mathrm{wor}\text{-}H}(\cdot,2)$ exactly, we know that there is a jump at $\frac12$, since $\ell^{\mathrm{wor}\text{-}H}(r,2)>1/2$ for all $r>1/2$. Note also that for large $r$, the values of $\ell^{\mathrm{wor}\text{-}H}(r,2)$ are close to $1$. This means that the power of function values is zero for $r\in(0,\frac12]$, and is almost optimal for large $r$.
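Theorem 1 can be encoded as a small function that returns the interval known to contain $\ell^{\mathrm{wor}\text{-}H}(r,2)$; multiplying its lower end by $r$ recovers the lower bound $2r^2/(2r+1)$ on $r^{\mathrm{std\text{-}wor}}(H,L_2)$ from [8]. The function name is ours, and exact rational arithmetic is used only to make the comparison with $1/2$ exact.

```python
from fractions import Fraction


def wor_h_2_interval(r):
    """Interval containing l^{wor-H}(r, 2) according to Theorem 1."""
    r = Fraction(r)
    if r <= 0:
        raise ValueError("r must be positive")
    if r <= Fraction(1, 2):
        return Fraction(0), Fraction(0)
    return 2 * r / (2 * r + 1), Fraction(1)


for r in (Fraction(1, 4), Fraction(1, 2), Fraction(51, 100), Fraction(1), Fraction(10)):
    lower, upper = wor_h_2_interval(r)
    print(r, lower, upper, "guaranteed std rate >=", r * lower)
```

The jump at $\frac12$ is visible in the output: just above $r=\frac12$ the lower end already exceeds $\frac12$.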

The problem of finding the exact values of $\ell^{\mathrm{wor}\text{-}H}(r,2)$ for $r>\frac12$ is one of the main open problems in the worst case setting. We know that many people, including the two of us, spent a lot of time trying to solve this problem, but so far in vain. That is why we propose an open problem with the hope that it will soon be solved by the reader.

Open Problem 1. Suppose that $r>\frac12$. Is it true that

$$\ell^{\mathrm{wor}\text{-}H}(r,2)=1?$$

If not, what are the values of $\ell^{\mathrm{wor}\text{-}H}(r,2)$?



E. Novak, H. Wozniakowski 






The rate of convergence does not distinguish between sequences that differ by a power of logarithms of $n$. Indeed, for $c_n=n^{-r}$ and $b_n=n^{-r}[\ln(n+1)]^\beta$ with a positive $r$ and an arbitrary $\beta$, we have $r(c_n)=r(b_n)=r$, independently of $\beta$. Obviously, for some standard spaces, we would like to know not only the rate but also the power of logarithms. We discuss this point in the next example, where we use the notation

$$c_n\asymp b_n,$$

which means that there exist positive numbers $a_1$ and $a_2$ such that $a_1\le c_n/b_n\le a_2$ for large $n$.
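The effect of a logarithmic factor can be seen numerically. The sketch below is our own illustration with the assumed values $r=1$ and $\beta=3$; it works with $\ln c_n$ and $\ln b_n$ to avoid underflow. The quotient $b_n/c_n=[\ln(n+1)]^\beta$ is unbounded, so $b_n\not\asymp c_n$, yet $-\ln b_n/\ln n$ creeps towards the common rate $r$, only slowly.

```python
import math

r, beta = 1.0, 3.0  # assumed parameters


def log_c(n):
    return -r * math.log(n)                                      # ln c_n, c_n = n^{-r}


def log_b(n):
    return -r * math.log(n) + beta * math.log(math.log(n + 1))   # ln b_n


for n in (10**2, 10**4, 10**8, 10**16, 10**100):
    rate_c = -log_c(n) / math.log(n)
    rate_b = -log_b(n) / math.log(n)
    quotient = math.exp(log_b(n) - log_c(n))  # b_n / c_n = [ln(n+1)]^beta
    print(f"n=1e{len(str(n)) - 1}: {rate_c:.3f} {rate_b:.3f} {quotient:.3g}")
```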

Example 2 (Sobolev spaces, $p=2$).

a) For the standard Sobolev spaces $W_2^s([0,1]^d)$ with an arbitrary $s>0$, which measure the total smoothness of functions, it is well known that

$$e_n^{\mathrm{all\text{-}wor}}(W_2^s([0,1]^d),L_2)\asymp n^{-s/d}.$$

Of course, in general, function values are not well defined in $W_2^s([0,1]^d)$. We must assume the embedding condition $2s>d$, and then function values are well defined and they are continuous linear functionals. Furthermore, it is known that

$$e_n^{\mathrm{all\text{-}wor}}(W_2^s([0,1]^d),L_2)\asymp e_n^{\mathrm{std\text{-}wor}}(W_2^s([0,1]^d),L_2)\asymp n^{-s/d},$$

see, e.g., [14] for a survey of such results.

b) For the Sobolev spaces $W_2^{r,\mathrm{mix}}([0,1]^d)$ with $r>0$, which measure the smoothness of functions with respect to each variable, it is known that

$$e_n^{\mathrm{all\text{-}wor}}(W_2^{r,\mathrm{mix}}([0,1]^d),L_2)\asymp n^{-r}(\log n)^{(d-1)r},$$

see, e.g., [TJ [TTJ [TTl [191 ED [26], where this result can be found in various generalities. For function values, we must assume that $r>1/2$, and then the best upper bound is

$$e_n^{\mathrm{std\text{-}wor}}(W_2^{r,\mathrm{mix}}([0,1]^d),L_2)=O\big(n^{-r}(\log n)^{(d-1)(r+1/2)}\big),$$

see [H1Q11 E2].

It is not known whether this extra power $(d-1)/2$ of logarithms is needed. It would be very interesting to verify whether

$$e_n^{\mathrm{all\text{-}wor}}(W_2^{r,\mathrm{mix}}([0,1]^d),L_2)\asymp e_n^{\mathrm{std\text{-}wor}}(W_2^{r,\mathrm{mix}}([0,1]^d),L_2)$$

holds also for this example. □

The examples in [6] use very irregular sequences $(e_n^{\mathrm{all\text{-}wor}}(H,L_2))$ and hence do not exclude a positive answer to the question in the next open problem.

Open Problem 2. Assume that $e_n^{\mathrm{all\text{-}wor}}(H,L_2)\asymp n^{-r}[\ln(n+1)]^\beta$ with arbitrary $r>0$ and $\beta\in\mathbb{R}$. Is it true that this implies

$$e_n^{\mathrm{std\text{-}wor}}(H,L_2)\asymp e_n^{\mathrm{all\text{-}wor}}(H,L_2)?$$
2.2 Single Hilbert Case



In this short subsection, we mostly consider the approximation problem defined over a Hilbert space with the error measured in the non-Hilbert space $L_p$ for $p\ne2$. That is why the name of this subsection is the single Hilbert case.

We report on a recent result of Tandetzky [18], who considered the approximation problem for arbitrary $p\in[1,\infty)$. He proved that for any $r\in(0,\min(\frac1p,\frac12)]$ there exists a Hilbert space $H$ continuously embedded in $L_p=L_p([0,1])$ such that

$$r^{\mathrm{all\text{-}wor}}(H,L_p)=r\quad\text{and}\quad r^{\mathrm{std\text{-}wor}}(H,L_p)=0.$$

This result obviously implies that the power function is zero over $(0,\min(\frac1p,\frac12)]$. It seems to us that no example is known in the literature of a Hilbert space for which $e_n^{\mathrm{all\text{-}wor}}(H,L_p)$ tends to zero faster than $e_n^{\mathrm{std\text{-}wor}}(H,L_p)$ under the additional assumption that $r^{\mathrm{all\text{-}wor}}(H,L_p)>\min(\frac1p,\frac12)$. This implies that we do not know the behavior of the power function over $(\min(\frac1p,\frac12),\infty)$. We summarize our partial knowledge of the power function in the following theorem.

Theorem 2 ([18]). Let $p\ne2$. Then

$$\ell^{\mathrm{wor}\text{-}H}(r,p)=0\quad\text{for all } r\in\Big(0,\min\Big(\frac1p,\frac12\Big)\Big].$$



Only for the case $p=\infty$ do we know a little more about the behavior of the power function. In this case the rates are related as explained in the following theorem.

Theorem 3 ([12]). Let $F$ be a Hilbert or a Banach space. Then

$$e_n^{\mathrm{std\text{-}wor}}(F,L_\infty)\le(1+n)\,e_n^{\mathrm{all\text{-}wor}}(F,L_\infty)\quad\text{for all } n\in\mathbb{N}.\qquad(1)$$

This inequality follows from Proposition 1.2.5, page 16, in [12], where it is stated for the Kolmogorov widths; it also applies to the linear or Gelfand widths.

The inequality (1) cannot be improved even if we assume that $F$ is a Hilbert space. This follows from the following example.



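Example 3. Take $F=H=\mathbb{R}^{n+1}$. That is, $f\in H$ is now defined on $\{1,2,\dots,n+1\}$ and can be identified with $f=[f_1,f_2,\dots,f_{n+1}]$, where $f_i=f(i)$. The space $H$ is equipped with the inner product

$$\langle f,g\rangle_H=\Big(\sum_{i=1}^{n+1}f_i\Big)\Big(\sum_{i=1}^{n+1}g_i\Big)+\varepsilon\sum_{i=1}^{n+1}f_ig_i\quad\text{for all } f,g\in H,$$

where $\varepsilon>0$. The unit ball of $H$ is thus

$$B=\Big\{f\in\mathbb{R}^{n+1} : \Big(\sum_{i=1}^{n+1}f_i\Big)^2+\varepsilon\sum_{i=1}^{n+1}f_i^2\le1\Big\}.$$

Then we obtain

$$e_n^{\mathrm{std\text{-}wor}}(H,L_\infty)\ge\frac{1}{\sqrt{1+\varepsilon}},$$

which tends to $1$ as $\varepsilon\to0$. Indeed, knowing $f(x_i)$ for $i=1,2,\dots,n$, with $x_i\in\{1,2,\dots,n+1\}$, we take $f$ such that $f(x_i)=0$. Since we have at most $n$ conditions on $n+1$ components of $f$, at least one component of $f$ from the unit ball is free; setting all other components to zero, this component can be taken as $\pm1/\sqrt{1+\varepsilon}$. Both signs yield the same information, so the worst case error of any algorithm is at least $1/\sqrt{1+\varepsilon}$, which in the limit as $\varepsilon$ goes to zero is $1$.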
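Consider the information

$$N(f)=[f_1-f_2,\,f_2-f_3,\,\dots,\,f_n-f_{n+1}]\quad\text{for all } f\in H.$$

It is known that the minimal error of all algorithms that use $N$ is the supremum of $\|f\|_\infty$ for $f\in B$ and $N(f)=0$. Observe that $N(f)=0$ implies that $f=[c,c,\dots,c]$. Next, $f\in B$ implies that

$$c^2\le\frac{1}{(1+\varepsilon/(n+1))\,(n+1)^2}.$$

Hence $e_n^{\mathrm{all\text{-}wor}}(H,L_\infty)\le1/(n+1)$ for every $\varepsilon>0$. Together with the lower bound for standard information, the ratio $e_n^{\mathrm{std\text{-}wor}}(H,L_\infty)/e_n^{\mathrm{all\text{-}wor}}(H,L_\infty)$ is at least $(n+1)/\sqrt{1+\varepsilon}$, so the factor $1+n$ in (1) cannot be replaced by any smaller constant. □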
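The two bounds in Example 3 can be checked numerically. The Python sketch below (helper names and the chosen values of $n$ and $\varepsilon$ are ours) verifies that the two extremal vectors lie on the unit sphere of $H$ and that their sup norms, $1/\sqrt{1+\varepsilon}$ and $|c|$, have a ratio approaching $n+1$ as $\varepsilon\to0$, which is the constant in (1).

```python
import math


def h_norm(f, eps):
    """||f||_H for <f, g>_H = (sum f_i)(sum g_i) + eps * sum f_i g_i."""
    return math.sqrt(sum(f) ** 2 + eps * sum(x * x for x in f))


def example3(n, eps):
    m = n + 1
    # Standard information: an unobserved index carries t e_j with ||t e_j||_H = 1;
    # f = t e_j and -f give the same n function values (all zero).
    t = 1.0 / math.sqrt(1.0 + eps)
    spike = [t] + [0.0] * n
    # Information N(f) = (f_1 - f_2, ..., f_n - f_{n+1}): its null space is c (1, ..., 1).
    c = 1.0 / math.sqrt(m * m + eps * m)
    constant = [c] * m
    return spike, constant, t, c


n = 4
for eps in (1e-1, 1e-4, 1e-8):
    spike, constant, t, c = example3(n, eps)
    print(eps, h_norm(spike, eps), h_norm(constant, eps), t, c, t / c)
```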
Let $r^{\mathrm{all\text{-}wor}}(F,L_\infty)=r>1$. Then the inequality (1) implies that

$$r^{\mathrm{std\text{-}wor}}(F,L_\infty)\ge r-1.$$
Thus, Theorem 3 implies the following behavior of the power function for $p=\infty$.

Theorem 4.

$$\ell^{\mathrm{wor}\text{-}H/B}(r,\infty)\in\Big[\frac{r-1}{r},\,1\Big]\quad\text{for all } r>1.$$



Hence, for both $p=2$ and $p=\infty$, we see that for large $r$ the power function is almost one.

We want to guess the behavior of the power function for $r>\min(\frac1p,\frac12)$. It can be helpful to see the actual rates of convergence for some standard spaces. In particular, for $p=\infty$, the rates are known for Sobolev spaces.

Example 4 (Sobolev spaces, $p=\infty$).

a) For the Sobolev spaces $W_2^s([0,1]^d)$ and an arbitrary $s$ for which $2s>d$, it is well known that

$$e_n^{\mathrm{all\text{-}wor}}(W_2^s([0,1]^d),L_\infty)\asymp e_n^{\mathrm{std\text{-}wor}}(W_2^s([0,1]^d),L_\infty)\asymp n^{-s/d+1/2},$$

see, e.g., [H].

b) For the Sobolev spaces $W_2^{s,\mathrm{mix}}([0,1]^d)$ with $s>1/2$, it is known that

$$e_n^{\mathrm{all\text{-}wor}}(W_2^{s,\mathrm{mix}}([0,1]^d),L_\infty)\asymp e_n^{\mathrm{std\text{-}wor}}(W_2^{s,\mathrm{mix}}([0,1]^d),L_\infty)\asymp n^{-s+1/2}(\log n)^{(d-1)s},$$

see EDI. □



Hence, at least for the standard Sobolev spaces, the rates are the same even up to logarithmic factors. This again suggests that the power function may be exactly one for all $r\in(\min(\frac1p,\frac12),\infty)$. This is the next open problem.

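Open Problem 3. Verify whether it is true that for all $p\in[1,\infty]$ we have

$$\ell^{\mathrm{wor}\text{-}H}(r,p)=\begin{cases}0 & \text{for all } r\in\big(0,\min(\frac1p,\frac12)\big],\\[2pt] 1 & \text{for all } r\in\big(\min(\frac1p,\frac12),\infty\big).\end{cases}$$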

We end this section with a remark on the rates of convergence for different p. 
Remark 2. It is interesting to compare the sequences

$$e_n^{\mathrm{all\text{-}wor}}(H,L_p)\quad\text{and/or}\quad e_n^{\mathrm{std\text{-}wor}}(H,L_p)$$

for the same $H$ but different $p$. The following example shows that, in general, there is no relation between these sequences. Some relations do exist, as shown in [7], but under some additional assumptions about $H$. The following example shows that some assumptions on $H$ are indeed needed; otherwise everything can happen.

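Take $L_2=L_2([0,1])$, $L_\infty=L_\infty([0,1])$ and assume that $[0,1]$ is the disjoint union of intervals $I_k$ of positive length $\lambda_k$ such that $\sum_{k=1}^\infty\lambda_k=1$. Assume also that

$$\lambda_1\ge\lambda_2\ge\cdots$$

and put $e_k=\mathbf{1}_{I_k}$. We define a Hilbert space $H$ by its unit ball

$$B=\Big\{f=\sum_{k=1}^\infty a_ke_k : \sum_{k=1}^\infty\frac{a_k^2}{\gamma_k^2}\le1\Big\},$$

where

$$\gamma_1\ge\gamma_2\ge\cdots>0\quad\text{with}\quad\lim_{k\to\infty}\gamma_k=0.$$

Hence for $f=\sum_{k=1}^\infty a_ke_k\in H$, we obtain

$$\|f\|_H^2=\sum_{k=1}^\infty\frac{a_k^2}{\gamma_k^2},\qquad\|f\|_2^2=\sum_{k=1}^\infty a_k^2\lambda_k,\qquad\|f\|_\infty=\sup_k|a_k|.$$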

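From this, we easily conclude that the optimal approximation for $L_2$ as well as for $L_\infty$ is given by

$$A_n(f)=\sum_{k=1}^n a_ke_k\quad\text{for}\quad f=\sum_{k=1}^\infty a_ke_k.$$

Note that

$$a_k=\gamma_k^2\,\langle f,e_k\rangle_H=f(x_k),$$

where $x_k\in I_k$. This means that the optimal error algorithm for function values and linear functionals is the same, and therefore

$$e_n^{\mathrm{all\text{-}wor}}(H,L_p)=e_n^{\mathrm{std\text{-}wor}}(H,L_p)\quad\text{for } p\in\{2,\infty\}.$$

However,

$$e_n^{\mathrm{all\text{-}wor}}(H,L_\infty)=\gamma_{n+1}\quad\text{and}\quad e_n^{\mathrm{all\text{-}wor}}(H,L_2)=\gamma_{n+1}\sqrt{\lambda_{n+1}}.$$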
Since $(\gamma_n)$ and $(\lambda_n)$ are not related, it is easy to get an example with

$$r^{\mathrm{all\text{-}wor}}(H,L_\infty)=0\quad\text{but}\quad r^{\mathrm{all\text{-}wor}}(H,L_2)=\infty.$$

Hence, in general, the difference between the minimal rates for $L_2$ and $L_\infty$ approximation can be extreme.
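The extreme gap is easy to reproduce. The sketch below uses the hypothetical choices $\lambda_k=2^{-k}$ (so $\sum_k\lambda_k=1$ and $\lambda_k$ decreases) and $\gamma_k=1/\ln(k+1)$ (which decreases to zero slowly); neither choice comes from the text. It evaluates the crude rate proxy $-\ln e_n/\ln n$ for both error sequences, working with logarithms to avoid underflow. Since function values and linear functionals are equally powerful here, the same numbers describe $e_n^{\mathrm{std\text{-}wor}}$.

```python
import math


def log_err_inf(n):
    """ln e_n^{all-wor}(H, L_inf) = ln gamma_{n+1} with gamma_k = 1/ln(k+1)."""
    return -math.log(math.log(n + 2))


def log_err_2(n):
    """ln e_n^{all-wor}(H, L_2) = ln(gamma_{n+1} sqrt(lambda_{n+1})), lambda_k = 2^{-k}."""
    return log_err_inf(n) - 0.5 * (n + 1) * math.log(2.0)


def rate_proxy(log_err, n):
    return -log_err(n) / math.log(n)


for n in (10, 10**3, 10**6):
    print(n, round(rate_proxy(log_err_inf, n), 3), round(rate_proxy(log_err_2, n), 1))
```

The $L_\infty$ proxy drifts towards $0$ while the $L_2$ proxy grows without bound, matching $r^{\mathrm{all\text{-}wor}}(H,L_\infty)=0$ and $r^{\mathrm{all\text{-}wor}}(H,L_2)=\infty$.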
2.3 Banach Case

In this subsection, we study the approximation problem defined over a Banach space that is continuously embedded in $L_p$. As always, we assume that function evaluations are continuous functionals. We establish some bounds on the power function by recalling known results for Sobolev spaces.

Example 5 (Sobolev spaces, $1\le p<\infty$).

For the Sobolev space $W_p^s([0,1]^d)$ with an arbitrary $s>0$, it is known that

$$e_n^{\mathrm{all\text{-}wor}}(W_p^s([0,1]^d),L_p)\asymp n^{-s/d}.$$

Function values are well defined in $W_p^s([0,1]^d)$ only if the embedding condition $s/d>1/p$, or $s=d$ and $p=1$, holds. However, we may use the approach suggested in [2] that allows us to consider the case without this embedding condition. Namely, we limit ourselves to continuous functions by taking

$$F=W_p^s([0,1]^d)\cap C([0,1]^d)$$

with the norm

$$\|f\|_F=\|f\|_{W_p^s([0,1]^d)}+\|f\|_{C([0,1]^d)}.$$

Here, $C([0,1]^d)$ is the space of continuous functions equipped with the max norm. Then $F$ is a Banach space on which function values are well defined and are continuous linear functionals. Then for $s/d\le1/p$ if $p>1$, and for $s/d<1$ if $p=1$, it was shown in [2] that

$$e_n^{\mathrm{std\text{-}wor}}(F,L_p)\asymp1.$$

□



The last example implies that

$$\ell^{\mathrm{wor}\text{-}B}(r,p)=0\quad\text{for all } r\in(0,1/p] \text{ and } 1<p<\infty,$$
$$\ell^{\mathrm{wor}\text{-}B}(r,1)=0\quad\text{for all } r\in(0,1).$$

We now show that $\ell^{\mathrm{wor}\text{-}B}(r,p)=0$ over larger domains of $r$ for a given $p$ by recalling other results for Sobolev spaces.

Example 6 (Sobolev space $W_1^s([0,1]^d)$, $1 \le p < \infty$).

Consider the approximation problem for the Sobolev space $W_1^s([0,1]^d)$ with error measured in $L_p = L_p([0,1]^d)$. This problem is well defined and convergent for the class $\Lambda^{\text{all}}$ if we assume that $s/d > 1 - 1/p$.

For $p \in [1, 2]$, we have

$$e_n^{\text{all-wor}}(W_1^s([0,1]^d), L_p) \asymp n^{-s/d},$$

whereas for $p \in [2, \infty)$, we have

$$e_n^{\text{all-wor}}(W_1^s([0,1]^d), L_p) \asymp n^{-s/d + 1/2 - 1/p},$$

see, e.g., [24]. The last relation also holds for $p = \infty$, as will be needed later.

The same results are also valid for the space $F = W_1^s([0,1]^d) \cap C([0,1]^d)$ with the norm

$$\|f\|_F = \|f\|_{W_1^s([0,1]^d)} + \|f\|_{C([0,1]^d)}.$$

For the space $F$, we can consider function values for all $s/d > 1 - 1/p$. For $s/d < 1$, we have

$$e_n^{\text{std-wor}}(F, L_p) \asymp 1.$$

□
Let $p \in [1, 2]$. The previous example implies that

$$\xi^{\text{wor-B}}(r,p) = 0 \quad\text{for all } r \in \left(1 - \tfrac{1}{p}, 1\right].$$

For $p \in (1, 2]$, we showed before that $\xi^{\text{wor-B}}(r,p) = 0$ for all $r \in (0, 1/p]$. Since $1 - 1/p \le 1/p$ for $p \le 2$, we have $(0, 1/p] \cup (1 - 1/p, 1] = (0, 1]$, and we obtain

$$\xi^{\text{wor-B}}(r,p) = 0 \quad\text{for all } r \in (0, 1] \text{ and } p \in [1, 2].$$
Let $p \in [2, \infty)$. The previous example implies that

$$\xi^{\text{wor-B}}(r,p) = 0 \quad\text{for all } r \in \left(\tfrac12, \tfrac12 + \tfrac1p\right].$$

Now we show that $\xi^{\text{wor-B}}(r,p) = 0$ also for $p \in [2, \infty)$ and $r \in (0, 1/2]$. We enlarge the space $F = W_1^s([0,1]) \cap C([0,1])$ with the norm

$$\|f\|_F = \|f\|_{W_1^s([0,1])} + \|f\|_{C([0,1])}$$

(for $d = 1$) even further by adding functions from a Hölder class $C^a$, where $0 < a \le 1/2$. Hence we take the space

$$\tilde F = F + C^a$$

with the norm

$$\|f\|_{\tilde F} := \inf\{\|g\|_F + \|h\|_{C^a} \mid f = g + h,\ g \in F,\ h \in C^a\}.$$

Since the unit ball of $\tilde F$ is larger than that of $F$, we still have $e_n^{\text{std-wor}}(\tilde F, L_p) \asymp 1$ for $s < 1$. It is well known that $e_n^{\text{all-wor}}(C^a([0,1]), L_p) \asymp n^{-a}$, and the same holds for $\tilde F$ if $a < s - 1/2 + 1/p$. Hence, for $p \in [2, \infty)$, we obtain

$$\xi^{\text{wor-B}}(r,p) = 0 \quad\text{for all } r \in \left(0, \tfrac12 + \tfrac1p\right].$$

We learnt some properties of the power function by using known results for Sobolev spaces $W_{p_1}^s([0,1]^d)$ in the case $s/d < 1/p_1$, so that function values did not even supply convergence. Since we needed to assume that $s/d > 1/p_1 - 1/p$, the case $p = \infty$ could not be covered: for $p = \infty$ the two conditions $s/d < 1/p_1$ and $s/d > 1/p_1$ are incompatible.

We now recall some results for Sobolev spaces when the embedding condition is satisfied and 
when there is a difference in the convergence rates between function values and arbitrary linear 
functionals. 

Example 7 (Sobolev space $W_1^s([0,1]^d)$, $1 \le p \le \infty$).

Consider the approximation problem for the Sobolev space $W_1^s([0,1]^d)$ with error measured in $L_p$. We now assume that $s/d > 1$. Then function values are well defined and are continuous linear functionals. Furthermore,

$$e_n^{\text{std-wor}}(W_1^s([0,1]^d), L_p) \asymp n^{-s/d + 1 - 1/p},$$

see, e.g., the survey of such results in Section 4.2.4 of [14] or [23, 24]. □
The last two examples imply the following estimates of the power function. For all $r > 1$ and $p \in [1, 2]$, we have

$$\xi^{\text{wor-B}}(r,p) \le 1 - \frac{1}{r}\left(1 - \frac{1}{p}\right),$$

and for all $r > 1$ and $p \in [2, \infty]$, we have

$$\xi^{\text{wor-B}}(r,p) \le 1 - \frac{1}{2r}.$$
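Both bounds come from comparing the rates in Examples 6 and 7 for $W_1^s([0,1]^d)$ with $s/d > 1$. The following is our own spelled-out version of that step, with $r$ denoting the rate for $\Lambda^{\text{all}}$:

```latex
% p in [1,2]: rate for Lambda^all from Example 6, rate for Lambda^std from Example 7
r = \frac{s}{d}, \qquad
r^{\text{std-wor}} = \frac{s}{d} - 1 + \frac{1}{p} = r - \Big(1 - \frac{1}{p}\Big)
\quad\Longrightarrow\quad
\frac{r^{\text{std-wor}}}{r} = 1 - \frac{1}{r}\Big(1 - \frac{1}{p}\Big), \qquad r > 1,

% p in [2,\infty]: rate for Lambda^all from Example 6, rate for Lambda^std from Example 7
r = \frac{s}{d} - \frac12 + \frac1p, \qquad
r^{\text{std-wor}} = \frac{s}{d} - 1 + \frac{1}{p} = r - \frac12
\quad\Longrightarrow\quad
\frac{r^{\text{std-wor}}}{r} = 1 - \frac{1}{2r}.
```

In the second case, the assumption $s/d > 1$ of Example 7 reads $r > 1/2 + 1/p$. This holds for every $r > 1$ when $p \ge 2$.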



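We summarize the properties of the power function established in this section in the following theorem. The only case where we have a positive lower bound is the case $p = \infty$, see Theorem 2.

Theorem 5.

$$\xi^{\text{wor-B}}(r,p) = 0 \quad\text{for all } r \in (0, 1] \text{ and } p \in [1, 2],$$
$$\xi^{\text{wor-B}}(r,p) = 0 \quad\text{for all } r \in \left(0, \tfrac12 + \tfrac1p\right] \text{ and } p \in (2, \infty),$$
$$\xi^{\text{wor-B}}(r,p) \le 1 - \frac{1}{r}\left(1 - \frac{1}{p}\right) \quad\text{for all } r > 1 \text{ and } p \in [1, 2],$$
$$\xi^{\text{wor-B}}(r,p) \le 1 - \frac{1}{2r} \quad\text{for all } r > 1 \text{ and } p \in [2, \infty),$$
$$1 - \frac{1}{r} \le \xi^{\text{wor-B}}(r, \infty) \le 1 - \frac{1}{2r} \quad\text{for all } r > 1.$$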
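The bounds of Theorem 5 can be collected into a small lookup function for quick reference. This is a minimal sketch: the name `known_bounds` is ours, $p = \infty$ is encoded as `math.inf`, and where Theorem 5 says nothing the function falls back to the trivial interval $[0, 1]$.

```python
import math


def known_bounds(r: float, p: float) -> tuple[float, float]:
    """Return (lower, upper) bounds on xi^{wor-B}(r, p) as stated in Theorem 5.

    Hypothetical helper: p = math.inf stands for L_infinity, and the trivial
    bounds 0 <= xi <= 1 are returned where Theorem 5 gives no information.
    """
    if r <= 0 or p < 1:
        raise ValueError("need r > 0 and p >= 1")
    if 1 <= p <= 2 and r <= 1:
        return (0.0, 0.0)
    if 2 < p < math.inf and r <= 0.5 + 1 / p:
        return (0.0, 0.0)
    if r > 1:
        if p <= 2:
            return (0.0, 1 - (1 - 1 / p) / r)
        if p < math.inf:
            return (0.0, 1 - 1 / (2 * r))
        return (1 - 1 / r, 1 - 1 / (2 * r))  # p = infinity
    return (0.0, 1.0)  # (r, p) not covered by Theorem 5
```

For example, `known_bounds(2, math.inf)` returns `(0.5, 0.75)`. At $p = 2$ the two upper bounds for $r > 1$ coincide, so the branch order does not matter there.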

It is interesting to note that although we do not know the exact values of the power functions in the Hilbert and Banach cases, we can check that they differ at least for $p = 2$. Indeed, from Theorems 1 and 5, we have

$$\xi^{\text{wor-B}}(r,2) = \xi^{\text{wor-H}}(r,2) \quad\text{for all } r \in \left(0, \tfrac12\right],$$
$$\xi^{\text{wor-B}}(r,2) = 0 < \frac{2r}{2r+1} \le \xi^{\text{wor-H}}(r,2) \quad\text{for all } r \in \left(\tfrac12, 1\right],$$
$$\xi^{\text{wor-B}}(r,2) \le 1 - \frac{1}{2r} < \frac{2r}{2r+1} \le \xi^{\text{wor-H}}(r,2) \quad\text{for all } r \in (1, \infty).$$

This shows that, at least for $p = 2$, the power of function values is larger in the Hilbert case than in the Banach case for all $r > 1/2$.

Obviously, it would be desirable to find the exact values of the power function $\xi^{\text{wor-B}}(r,p)$ for all $r \in (0, \infty)$ and $p \in [1, \infty]$. However, this could be a very difficult problem. Hence, as a possibly less difficult problem, we would like to check the following property of the power function.

Open Problem 4. For $p \in [1, \infty]$, find the supremum $a^*(p)$ of $a$ for which

$$\xi^{\text{wor-B}}(r,p) = 0 \quad\text{for all } r \in (0, a].$$

We only know that $a^*(p) \ge 1$ for $p \in [1, 2]$ and $a^*(p) \ge 1/2 + 1/p$ for $p \in (2, \infty)$.

We already indicated that the power functions for the Hilbert and Banach cases are different 
for p = 2. It would be of interest to check if this holds for all p. 

Open Problem 5. Find all $p \in [1, \infty]$ for which

$$\xi^{\text{wor-B}}(\cdot, p) \ne \xi^{\text{wor-H}}(\cdot, p).$$
Similar to Example 3, we present an example of a Banach space $F$ for which the ratio

$$\frac{e_n^{\text{std-wor}}(F, L_p)}{e_n^{\text{all-wor}}(F, L_p)}$$

is large for $p > 1$ and a fixed $n$.



Example 8. Take $F = \ell_1^{n+1}$, i.e., $F = \mathbb{R}^{n+1}$ with the $\ell_1$ norm. Then we obtain

$$e_n^{\text{std-wor}}(F, L_p) = (n+1)^{1-1/p}\, e_n^{\text{all-wor}}(F, L_p), \tag{2}$$

since $e_n^{\text{std-wor}}(F, L_p) = 1$ and $e_n^{\text{all-wor}}(F, L_p) = (n+1)^{1/p-1}$. The upper bound in the last statement follows again with the information $N(x) = (x_2 - x_1, x_3 - x_2, \dots, x_{n+1} - x_n)$, while the lower bound follows from the fact that the unit ball of $\ell_1^{n+1}$ contains an $\ell_p^{n+1}$ ball of radius $(n+1)^{1/p-1}$.
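The facts used in Example 8 can be checked numerically. The sketch below (helper names are ours) checks three things. First, a coordinate missed by $n$ point evaluations gives error at least $1$. Second, the constant vector of $\ell_1$ norm one is invisible to the difference information $N(x)$ and has $\ell_p$ norm $(n+1)^{1/p-1}$. Third, on random samples, the Hölder inequality $\|x\|_1 \le (n+1)^{1-1/p}\|x\|_p$ behind the ball inclusion holds.

```python
import math
import random


def lp_norm(x, p):
    """l_p norm of a finite vector; p = math.inf gives the max norm."""
    if p == math.inf:
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def difference_information(x):
    """N(x) = (x_2 - x_1, ..., x_{n+1} - x_n) as in Example 8."""
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]


def check_example8(n, p, trials=1000, seed=0):
    """Return (std_lower, all_lower, holder_ok) for F = l_1^{n+1} in l_p^{n+1}."""
    dim = n + 1
    # n point evaluations miss some coordinate j (relabel so that j = 0);
    # e_j and -e_j then give identical data, so the error is >= ||e_j||_p = 1.
    e_j = [1.0] + [0.0] * n
    std_lower = lp_norm(e_j, p)
    # The constant vector with l_1 norm one is annihilated by N, so for
    # algorithms using N the error is at least ||c||_p = dim^(1/p - 1).
    c = [1.0 / dim] * dim
    assert all(t == 0.0 for t in difference_information(c))
    all_lower = lp_norm(c, p)
    # Hölder: ||x||_1 <= dim^(1 - 1/p) ||x||_p, i.e. the l_p ball of radius
    # dim^(1/p - 1) lies inside the l_1 unit ball.
    rng = random.Random(seed)
    holder_ok = all(
        lp_norm(x, 1) <= dim ** (1 - 1 / p) * lp_norm(x, p) * (1 + 1e-12)
        for x in ([rng.uniform(-1, 1) for _ in range(dim)] for _ in range(trials))
    )
    return std_lower, all_lower, holder_ok
```

For instance, `check_example8(5, 3)` returns `1.0` and a value close to $6^{-2/3}$. Their ratio is $6^{2/3} = (n+1)^{1-1/p}$, as in (2).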

Again, the ratio $(n+1)^{1-1/p}$ as in (2) can be obtained with a Hilbert space. In fact, we can take the same spaces as in Example 3, i.e., we define in $H = \mathbb{R}^{n+1}$ the scalar product



'n+l 

E 

.i=i 



fi 



n+l 

.1=1 



n+l 



1=1 



and consider the limit as $\varepsilon > 0$ tends to zero. □

We end this section with another open problem.

Open Problem 6. Find the supremum of $e_n^{\text{std-wor}}(F, L_p)/e_n^{\text{all-wor}}(F, L_p)$ over all Banach and/or Hilbert spaces $F$. So far, we know that

$$\sup_F \frac{e_n^{\text{std-wor}}(F, L_p)}{e_n^{\text{all-wor}}(F, L_p)} \ge (n+1)^{1-1/p}, \tag{3}$$

and equality holds if $p = \infty$.



3 Randomized setting 

We approximate the embedding operator $I: F \to L_p$ in the randomized setting. We now briefly
define this setting. The reader may find more on this subject, e.g., in [14 } 115 } 151]. 

We approximate $I$ by algorithms $A_n$ that use $n$ values of linear functionals on average, and each linear functional is chosen randomly with respect to a probability distribution.

More precisely, the algorithm $A_n$ is of the form

$$A_n(f, \omega) = \phi_{n,\omega}\big(L_{1,\omega_1}(f), L_{2,\omega_2}(f), \dots, L_{n(\omega),\omega_{n(\omega)}}(f)\big), \tag{4}$$

and the number $n(\omega)$ of functionals can also be random. Here $\omega = [\omega_1, \omega_2, \dots]$, and the linear functionals $L_{j,\omega_j}$ are random functionals distributed according to a probability distribution on elements $\omega_j$. This distribution may depend on $j$ as well as on the values already computed, i.e., on $L_{i,\omega_i}(f)$ for $i = 1, 2, \dots, j-1$. The mapping $\phi_{n,\omega}: \mathbb{R}^{n(\omega)} \to L_p$ is a random mapping, and

$$\mathbb{E}_\omega\, n(\omega) \le n.$$
We also allow adaptive choices of the functionals $L_{j,\omega_j}$. That is, $L_{j,\omega_j}$ may depend on the already selected functionals and the values $L_{1,\omega_1}(f), L_{2,\omega_2}(f), \dots, L_{j-1,\omega_{j-1}}(f)$.

Without loss of generality, we assume that $A_n(f, \cdot)$ is measurable, and define the randomized error of $A_n$ as

$$e^{\text{ran}}(A_n) = \sup_{\|f\|_F \le 1} \left(\mathbb{E}_\omega\, \|I(f) - A_n(f, \omega)\|_{L_p}^2\right)^{1/2}.$$
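The form (4) and the randomized error can be illustrated with a minimal sketch of one concrete randomized algorithm using $\Lambda^{\text{std}}$ on $[0,1]$. All choices are hypothetical and ours:

- $n(\omega)$ equals $n-1$ or $n+1$ with probability $1/2$ each, so $\mathbb{E}_\omega n(\omega) = n$.
- The evaluation points are i.i.d. uniform.
- $\phi_{n,\omega}$ is nearest-neighbour interpolation.
- The $L_p$ norm is approximated by a midpoint rule.

The supremum over the unit ball is not computed. The Monte Carlo value for a single $f$ therefore only estimates the inner expectation, which is a lower bound for $e^{\text{ran}}(A_n)$ whenever $\|f\|_F \le 1$.

```python
import bisect
import math
import random


def randomized_nn_algorithm(f, n, rng):
    """One realization omega of a (hypothetical) algorithm A_n(f, omega) of form (4)."""
    # Random number of evaluations with E n(omega) = n (for n >= 1).
    n_omega = n + rng.choice((-1, 1)) if n >= 1 else 0
    # Random evaluation points t_{j, omega_j}, i.i.d. uniform on [0, 1].
    points = sorted(rng.random() for _ in range(n_omega))
    values = [f(t) for t in points]

    def phi(x):
        # phi_{n, omega}: nearest-neighbour (piecewise constant) reconstruction.
        if not points:
            return 0.0  # no data: return the zero function
        i = bisect.bisect_left(points, x)
        nearest = min((j for j in (i - 1, i) if 0 <= j < len(points)),
                      key=lambda j: abs(points[j] - x))
        return values[nearest]

    return phi, n_omega


def lp_distance(f, g, p, grid=200):
    """Midpoint-rule approximation of ||f - g||_{L_p([0, 1])} for finite p."""
    xs = [(k + 0.5) / grid for k in range(grid)]
    return (sum(abs(f(x) - g(x)) ** p for x in xs) / grid) ** (1.0 / p)


def randomized_error(f, n, p, trials=100, seed=0):
    """Monte Carlo estimates of (E ||f - A_n(f, omega)||_p^2)^(1/2) and E n(omega)."""
    rng = random.Random(seed)
    sq_errors, counts = 0.0, 0
    for _ in range(trials):
        phi, n_omega = randomized_nn_algorithm(f, n, rng)
        sq_errors += lp_distance(f, phi, p) ** 2
        counts += n_omega
    return math.sqrt(sq_errors / trials), counts / trials
```

For $f(x) = \sin(2\pi x)$ and $p = 2$, the estimated error drops clearly when $n$ grows from 8 to 64, while the average number of evaluations stays close to $n$.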

Again, we compare such algorithms with algorithms that are based on function values, i.e., each $L_{j,\omega_j}$ is now of the form $L_{j,\omega_j}(f) = f(t_{j,\omega_j})$ and

Hence, we consider algorithms that use $n$ linear functionals either from the class $\Lambda^{\text{std}}$ or from the class $\Lambda^{\text{all}}$. We define the minimal errors as follows.

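Definition 2. For $n \in \mathbb{N}_0$, let

$$e_n^{\text{all-ran}}(F, L_p) = \inf\{e^{\text{ran}}(A_n) \mid L_j \in \Lambda^{\text{all}} \text{ and } A_n \text{ as in (4)}\},$$

and

$$e_n^{\text{std-ran}}(F, L_p) = \inf\{e^{\text{ran}}(A_n) \mid L_j \in \Lambda^{\text{std}} \text{ and } A_n \text{ as in (4)}\}.$$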



As in the worst case setting, for $n = 0$ it is easy to see that the best algorithm is $A_0 = 0$, and we obtain

$$e_0^{\text{all-ran}}(F, L_p) = e_0^{\text{std-ran}}(F, L_p) = \sup_{\|f\|_F \le 1} \|f\|_{L_p} = \sup_{\|f\|_F \le 1} \|I(f)\|_{L_p} = \|I\|.$$

This is the initial error, which can be achieved without computing any linear functional of the functions $f$. Clearly,

$$e_n^{\text{all-ran}}(F, L_p) \le e_n^{\text{std-ran}}(F, L_p) \quad\text{for all } n \in \mathbb{N}.$$

The sequences $(e_n^{\text{all-ran}}(F, L_p))$ and $(e_n^{\text{std-ran}}(F, L_p))$ are both non-increasing but not necessarily convergent to zero.

As in the worst case setting, we want to compare the rates of convergence

$$r^{\text{all-ran}}(F, L_p) = r\big(e_n^{\text{all-ran}}(F, L_p)\big) \quad\text{and}\quad r^{\text{std-ran}}(F, L_p) = r\big(e_n^{\text{std-ran}}(F, L_p)\big).$$
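Rates are asymptotic, so they cannot be computed from finitely many errors. A crude finite-$n$ proxy can still help intuition. The sketch below assumes that the rate is defined, as in the worst case setting, by $r(a) = \sup\{\beta \ge 0 : \lim_{n\to\infty} n^\beta a_n = 0\}$. It uses $-\ln a_n / \ln n$ at one large $n$ as a hypothetical estimate. Constant and logarithmic factors bias this proxy at any finite $n$.

```python
import math


def empirical_rate(a, n):
    """Finite-n proxy -ln(a_n) / ln(n) for the rate r(a) of a positive sequence a."""
    if n < 2:
        raise ValueError("need n >= 2 so that ln(n) > 0")
    return -math.log(a(n)) / math.log(n)


def empirical_power_ratio(e_std, e_all, n):
    """Finite-n proxy for r^{std}/r^{all}, the ratio whose infimum defines xi."""
    return empirical_rate(e_std, n) / empirical_rate(e_all, n)


def e_all_example(n):
    return n ** -2.0          # hypothetical error sequence with rate 2


def e_std_example(n):
    return 3 * n ** -1.5      # hypothetical error sequence with rate 1.5, constant 3


print(empirical_rate(e_std_example, 10**6))   # about 1.42, biased by the constant 3
```

The true ratio of rates in this example is $0.75$. At $n = 10^6$ the proxy `empirical_power_ratio(e_std_example, e_all_example, 10**6)` gives only about $0.71$, which shows how slowly constants wash out.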

In particular, we would like to know whether the sequence $(e_n^{\text{all-ran}}(F, L_p))$ can converge much faster than the sequence $(e_n^{\text{std-ran}}(F, L_p))$. The main question addressed in this section is to find or estimate the power function $\xi^{\text{ran-x}}: (0, \infty) \times [1, \infty] \to [0, 1]$ defined by

$$\xi^{\text{ran-x}}(r,p) := \inf_{F:\ r^{\text{all-ran}}(F, L_p) = r} \frac{r^{\text{std-ran}}(F, L_p)}{r},$$

where $x \in \{H, B\}$ indicates that the infimum is taken over all Hilbert spaces ($x = H$) or over all Banach spaces ($x = B$) continuously embedded in $L_p$ for which the rate of convergence is $r$ when we use arbitrary linear functionals. In the randomized setting, we do not need to assume that function values are continuous linear functionals.
3.1 Double Hilbert Case

In this subsection, we consider the approximation problem defined over a Hilbert space with the error also measured in the Hilbert space $L_2$. It may be surprising, but the results in the double Hilbert case are complete due to [28], and there is no need to distinguish cases depending on the value of $r$.

Theorem 6 ([28]). Let $I: H \to L_2(\Omega)$ be a continuous embedding from a Hilbert space $H$ into $L_2(\Omega)$. Then

$$r^{\text{all-ran}}(H, L_2) = r^{\text{std-ran}}(H, L_2).$$

Therefore

$$\xi^{\text{ran-H}}(r, 2) = 1 \quad\text{for all } r > 0.$$

We add that it was known before, see [13, 22], that also

$$r^{\text{all-ran}}(H, L_2) = r^{\text{all-wor}}(H, L_2).$$

This means that the power of function values in the randomized setting is the same as the power of 
arbitrary linear functionals in the worst case setting, which in turn is the same as in the randomized 
setting. 



3.2 Other Cases 

For $p > 2$, there are examples in the literature where the rate $r^{\text{all-ran}}(H, L_p)$ is larger than the rate $r^{\text{std-ran}}(H, L_p)$. Namely, take $I: W_2^r([0,1]) \to L_p([0,1])$. With $\Lambda^{\text{all}}$ one can achieve the order $n^{-r}$ (with additional log terms in the case $p = \infty$, but the order is still $r$), see [10]. For $\Lambda^{\text{std}}$ the optimal order is $n^{-r + 1/2 - 1/p}$, see [2]. The authors of [2, 10] studied the case of integer $r$, but the results can be extended via interpolation to all $r > 1$. Therefore, we obtain

$$\xi^{\text{ran-H}}(r,p) \le \frac{r - 1/2 + 1/p}{r} \quad\text{if } r > 1 \text{ and } p > 2.$$

We summarize these estimates of the power function in the following theorem.

Theorem 7. Let $p > 2$. Then

$$\xi^{\text{ran-H}}(r,p) \le 1 - \frac{1/2 - 1/p}{r} \quad\text{for all } r > 1.$$
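The bound in Theorem 7 is the ratio of the two orders quoted above, rewritten; here $r$ is the rate for $\Lambda^{\text{all}}$:

```latex
\frac{r^{\text{std-ran}}}{r^{\text{all-ran}}}
  = \frac{r - \tfrac12 + \tfrac1p}{r}
  = 1 - \frac{\tfrac12 - \tfrac1p}{r},
\qquad\text{and for } p = \infty:\quad 1 - \frac{1}{2r}.
```

For $p = \infty$ the numerator differs from $r$ by exactly $1/2$, which is the gap between the two classes for $L_\infty$ approximation.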

Sobolev embeddings in the randomized setting were studied by several authors, including [2, 3, HI [TDJ Q21 EH [23]. For our purpose, the most important papers are [2] and [10], together with [3] for the interpolation argument.

For the embedding $I: W_2^r([0,1]) \to L_\infty([0,1])$, the rate improves by $1/2$ if we switch from the class $\Lambda^{\text{std}}$ to the class $\Lambda^{\text{all}}$. This gap of $1/2$ is the largest possible under some additional conditions, see [7, 9]. Let us add in passing that the same gap of $1/2$ appears for $\Lambda^{\text{all}}$ between the worst case and the randomized setting.

The Hilbert case for $p \in [1, 2)$ as well as the Banach case for all $p \in [1, \infty]$ have not yet been studied. We pose this as an open problem.
Open Problem 7. Study the power function in the randomized setting for the Hilbert case with $p \in [1, 2)$ and for the Banach case for all $p \in [1, \infty]$. In particular, determine the supremum $a^*(p)$ of $a$ for which

$$\xi^{\text{ran-H/B}}(r,p) = 0 \quad\text{for all } r \in (0, a].$$

4 Average case setting with a Gaussian measure

In the average case setting, we assume that $I: F \to L_p(\Omega)$ is a continuous embedding and that function evaluations are continuous functionals on $F$. As far as we know, only the case $p = 2$ has been studied, and we report the known results from [5] for this case.

We assume that $F$ is a separable Hilbert or Banach space equipped with a zero mean Gaussian measure $\mu$. As in the worst case setting, we consider deterministic algorithms. By general results, see [21], it is enough to consider linear algorithms

$$A_n(f) = \sum_{k=1}^{n} L_k(f)\, g_k \quad\text{and}\quad A_n(f) = \sum_{k=1}^{n} f(x_k)\, g_k,$$

where $g_k \in L_2(\Omega)$. The average case error of an algorithm $A_n$ is defined by

$$e^{\text{avg}}(A_n) := \left(\int_F \|f - A_n(f)\|_{L_p}^p \, \mu(\mathrm{d}f)\right)^{1/p}.$$

As in the other settings, we define the minimal $n$th average case errors $e_n^{\text{all-avg}}(F, L_p)$ and $e_n^{\text{std-avg}}(F, L_p)$, and the power function $\xi^{\text{avg-H/B}}$. That is, we set

$$\xi^{\text{avg-x}}(r,p) := \inf_{F:\ r^{\text{all-avg}}(F, L_p) = r} \frac{r^{\text{std-avg}}(F, L_p)}{r}.$$

As always, $x \in \{H, B\}$, and we take the infimum over separable Hilbert ($x = H$) or Banach ($x = B$) spaces equipped with zero mean Gaussian measures that are continuously embedded in $L_p$, for which function values are continuous linear functionals and the rate of convergence is $r$ when arbitrary linear functionals are used.

As already mentioned, results are known only for $p = 2$. In this case, the Hilbert and Banach cases coincide because of the Gaussian measures. This follows from the fact that even if $F$ is a separable Banach space, the minimal errors for the class $\Lambda^{\text{all}}$ depend only on the Gaussian measure $\nu = \mu I^{-1}$ given by

$$\nu(M) = \mu(\{f \in F \mid I(f) \in M\})$$

for a Borel set $M$ of $L_2$. The measure $\nu$ is also a zero mean Gaussian measure whose covariance operator $C_\nu: L_2 \to L_2$ is given by

$$\langle C_\nu f_1, f_2\rangle_{L_2} = \int_{L_2} \langle f, f_1\rangle_{L_2}\, \langle f, f_2\rangle_{L_2}\, \nu(\mathrm{d}f) \quad\text{for all } f_1, f_2 \in L_2.$$
The operator $C_\nu$ is self-adjoint, positive semi-definite and compact, and it has a finite trace. That is, its ordered eigenvalues $\lambda_j$ have a finite sum. It is known that

$$e_n^{\text{all-avg}}(F, L_2) = \Big(\sum_{j=n+1}^{\infty} \lambda_j\Big)^{1/2}.$$
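The formula for $e_n^{\text{all-avg}}(F, L_2)$ is easy to evaluate once the eigenvalues of $C_\nu$ are known. Here is a minimal sketch, assuming hypothetical eigenvalues $\lambda_j = j^{-(2r+1)}$ truncated at a large index (the truncation is ours). The tail sum then behaves like $n^{-2r}/(2r)$, so $e_n^{\text{all-avg}}(F, L_2) \approx n^{-r}/\sqrt{2r}$ and the rate is $r$.

```python
import math


def avg_case_error_all(eigenvalues, n):
    """e_n^{all-avg}(F, L_2) = (sum_{j > n} lambda_j)^(1/2).

    eigenvalues[0] is lambda_1; the list is a finite truncation of the
    nonincreasing eigenvalue sequence of C_nu.
    """
    return math.sqrt(sum(eigenvalues[n:]))


# Hypothetical eigenvalues lambda_j = j^-(2r+1) with r = 1, truncated at N = 10^5.
r, N = 1.0, 10**5
lambdas = [j ** -(2 * r + 1) for j in range(1, N + 1)]
for n in (10, 100, 1000):
    # n^r * e_n should approach 1 / sqrt(2r), about 0.707, as n grows
    print(n, n ** r * avg_case_error_all(lambdas, n))
```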

As in the randomized setting for the double Hilbert case, the results on the power function are complete, and there is no need to distinguish cases depending on $r$.

Theorem 8 ([5]). Let $I: F \to L_2(\Omega)$ be a continuous embedding from a separable Banach space $F$, equipped with a zero mean Gaussian measure $\mu$, into $L_2(\Omega)$. Then

$$r^{\text{all-avg}}(F, L_2) = r^{\text{std-avg}}(F, L_2).$$

Therefore

$$\xi^{\text{avg-H/B}}(r, 2) = 1 \quad\text{for all } r > 0.$$

Of course, it would be interesting to study the power function for other values of $p$. This is posed as our last open problem.

Open Problem 8. Study the power function in the average case setting for $p \ne 2$. In particular, verify whether a result similar to Theorem 8 holds.